S3
Query
The Data Service is configured by the Qarbine administrator with an optional default bucket. This is set in the general purpose “Database” field. That value, or one chosen from a “Database” dropdown, provides context on which S3 bucket is being accessed.
Qarbine uses the generic term “collections” which maps to S3 “objects”. When listing these items in drop downs the following rules apply.
- Any '/' characters for folder delimiters get replaced with underscores because SQL does not like '/'.
- SQL also does not like '.' in the table name. Any file extension is removed and intermediate folder dots are converted to '_' characters. The object name 'data/foo/animals.json' shows up as 'data_foo_animals'. As a result a bucket with foo.csv and foo.json will list “foo” twice.
SQL Oriented Queries
Qarbine provides a SQL oriented querying interface to the S3 objects which are JSON, CSV, TSV, or XLSX based. The object name is used as the SQL “table” name. Note that in all cases the complete object is read and then processed. For large files a more optimized interaction may be preferred.
The JSON content has to be an array of objects. Here is a sample query,
select * from portfolio.json
The result is shown below.
Here is a query which sorts the answer set.
select * from portfolio.json order by cash asc
The result is shown below.
Here is a sample query with a WHERE clause.
select * from portfolio.json where isIRA = false
The result is shown below.
Table names ending with “.csv” are assumed to be comma separated value (CSV) files. Those ending in “.tsv” are tab separated value (TSV) files. The first row contains the field names.
The following column names may not be referenced: ‘target’ and ‘value’.
Virtual Queries
| Request | Description |
|---|---|
| readTextFile OBJECT | Read the contents of the references object. The result is an array with a single element. The element has a field ‘content’ with the text. |
| readJsonFile OBJECT | Read the contents of the references object and convert the text into a JSON object. If the JSON object is an array then it is left as-is. Otherwise the result is an array with a single element. The element has a field ‘content’ with the JSON object. |
| readBinaryFile OBJECT | Read the contents of the references object. The result is an array with a single element. The element has a field ‘content’ with the binary value. |
| list buckets | List the buckets accessible using the active data service. |
| list tables | List the objects in the bucket. This query uses the Qarbine general purpose “tables” term. |
Troubleshooting
When querying JSON or CSV files, you can always view the contents using the “readTextFile XXX” query. Be aware the complete file is retrieved so it may be best to avoid this with really large files.